1. Describing Scenario

1.1 Background

You are a junior data analyst working on the marketing analyst team at Bellabeat, a high-tech manufacturer of health-focused products for women. Bellabeat is a successful small company, but they have the potential to become a larger player in the global smart device market. Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company. You have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices. The insights you discover will then help guide marketing strategy for the company. You will present your analysis to the Bellabeat executive team along with your high-level recommendations for Bellabeat’s marketing strategy.


1.2 Business Task

Analyse the dataset to derive some meaningful insights and recommend some measures by which marketing team can imporve their marketing strategy.


1.3 Objectives

  • What are some trends in smart device usage?

  • How could these trends apply to Bellabeat customers?

  • How could these trends help influence Bellabeat marketing strategy?


1.4 Stakeholders

  • Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer

  • Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team

  • Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.


2. Preparation of Data

2.1 Information about dataset

Sršen encourages you to use public data that explores smart device users’ daily habits. She points you to a specific data set:

FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius): This Kaggle data set contains personal fitness tracker from thirty fitbit users. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and sleep that can be used to explore users’ habits.


2.3 Limitations of Dataset

  • Lack of transparency in terms of data collection

  • Outdated data (the survey was in 2016) might be the wrong choice for developing a new product in 2021

  • Sample size: Only 33 users and short observation period of one month (Möbius dataset) is available as compared to two months (original dataset)


3. Processing of Data

3.1 Cleaning and modifying data

  • Counting Duplicate Values in Daily Activity Table
SELECT Id, ActivityDate,
    COUNT(*) AS NumberOfRows
FROM bellabeat.dailyactivity
GROUP BY ID, ActivityDate
HAVING NumberOfRows > 1;

OUTPUT : Empty table because there are no duplicates


  • Counting Duplicate Values in Sleep Log Table
SELECT *,
    COUNT(*) AS NumberOfRows
FROM bellabeat.Sleeplog
GROUP BY Id, SleepDay, TotalSleepRecords, TotalTimeInBed, TotalMinutesAsleep
HAVING NumberOfRows > 1;
Id SleepDay TotalSleepRecords TotalMinutesAsleep TotalTimeInBed NumberOfRows
4388161847 2016-05-05 1 471 495 2
4702921684 2016-05-07 1 520 543 2
8378563200 2016-04-25 1 388 402 2

  • Creating a second Sleep Log Table with no duplicates
CREATE TABLE SleepLog2
SELECT DISTINCT * FROM SleepLog

  • Counting Duplicate Values in weigth Log Table.
SELECT Id, Date,
COUNT(*) AS NumberOfRows
FROM bellabeat.weightlog
GROUP BY Id, Date
HAVING NumberOfRows > 1;

OUTPUT : Empty table because there are no duplicates


  • Checking if there any wrong Id in Daily Activity,Sleep Log and Weight Log Table.
#Checking if there any wrong Id in Daily Activity
SELECT Id
FROM bellabeat.dailyactivity
WHERE LENGTH(Id) > 10
OR LENGTH(Id) < 10

#Checking if there any wrong Id in Sleep Log
SELECT Id
FROM bellabeat.sleeplog2
WHERE LENGTH(Id) > 10
OR LENGTH(Id) < 10

#Checking if there any wrong Id in Weight Log
SELECT Id
FROM bellabeat.weightlog
WHERE LENGTH(Id) > 10
OR LENGTH(Id) < 10;

OUTPUT : All three queries returned empty table because there are no wrong ID’s.


4. Anlaysis Phase

4.1 Summary Statistics of Dataset

SELECT 'TOTAL',
    ROUND(SUM(TotalSteps),2) AS Steps,
    ROUND(SUM(TotalDistance),2) AS Distance,
    ROUND(SUM(Calories),2) AS Calories
FROM bellabeat.dailyactivity
UNION
SELECT 'AVERAGE',
    ROUND(AVG(TotalSteps),2) AS Steps,
    ROUND(AVG(TotalDistance),2) AS Distance,
    ROUND(AVG(Calories),2) AS Calories
FROM bellabeat.dailyactivity
UNION
SELECT 'MIN',
    MIN(TotalSteps),
    MIN(TotalDistance),
    MIN(Calories)
FROM bellabeat.dailyactivity
UNION
SELECT 'MAX',
    ROUND(MAX(TotalSteps),2),
    ROUND(MAX(TotalDistance),2),
    ROUND(MAX(Calories),2)
FROM bellabeat.dailyactivity;
METRIC Steps Distance Calories
TOTAL 7179636 5160.32 2165393.00
AVERAGE 7637.91 5.49 2303.61
MIN 0 0 0.00
MAX 36019 28.03 4900.00

4.2 Daywise Average Metrics

SELECT
    DAYNAME(ActivityDate) AS DayOfWeek,
    ROUND(AVG(TotalSteps),2) AS AvgSteps,
    ROUND(AVG(TotalDistance),2) AS AvgDistance,
    ROUND(AVG(Calories),2) AS AvgCalories,
    ROUND(AVG(TotalMinutesAsleep)/60,2) AS Average_Sleep_Hours
FROM
    bellabeat.dailyactivity AS Act
LEFT JOIN
    bellabeat.sleeplog2 AS Sleep
ON
    Act.ActivityDate = Sleep.SleepDay
GROUP BY DayOfWeek
ORDER BY
    CASE
          WHEN DayOfWeek = 'Monday' THEN 1
          WHEN DayOfWeek = 'Tuesday' THEN 2
          WHEN DayOfWeek = 'Wednesday' THEN 3
          WHEN DayOfWeek = 'Thursday' THEN 4
          WHEN DayOfWeek = 'Friday' THEN 5
          WHEN DayOfWeek = 'Saturday' THEN 6
          WHEN DayOfWeek = 'Sunday' THEN 7
    END ASC;
DayOfWeek AvgSteps AvgDistance AvgCalories Average_Sleep_Hours
Monday 7750.91 5.53 2322.53 6.98
Tuesday 8115.02 5.83 2355.89 6.75
Wednesday 7593.5 5.51 2308.71 7.24
Thursday 7635.67 5.48 2266.94 6.67
Friday 7449.29 5.3 2330.70 6.76
Saturday 8176.29 5.87 2357.84 6.99
Sunday 6943.2 5.04 2262.51 7.56

4.3 Analysing Type of Sleepers

#Categorising Sleeper type
CREATE TABLE sleeper_type AS(
SELECT
    Id,
    ROUND(AVG(TotalMinutesAsleep)/60,2) AS Daily_Sleep,
    CASE
        WHEN ROUND(AVG(TotalMinutesAsleep)/60,2) < 4
        THEN 'Unhealthy Sleeper'
        WHEN ROUND(AVG(TotalMinutesAsleep)/60,2) >= 4 AND ROUND(AVG(TotalMinutesAsleep)/60,2)< 6
        THEN 'Bad Sleeper'
        WHEN ROUND(AVG(TotalMinutesAsleep)/60,2) >= 6 AND ROUND(AVG(TotalMinutesAsleep)/60,2)< 7
        THEN 'Normal Sleeper'
    ELSE 'Good Sleeper'
    END AS Sleeper_type
FROM bellabeat.sleeplog2
GROUP BY id);

#Determining their contribustion in our dataset
SELECT
    Sleeper_type,
    COUNT(Sleeper_type)/(SELECT COUNT(*) FROM sleeper_type)*100 AS Percent
FROM bellabeat.sleeper_type
GROUP BY Sleeper_type
ORDER BY Sleeper_type;
Sleeper_type Percent
Bad Sleeper 20.8333
Good Sleeper 45.8333
Normal Sleeper 20.8333
Unhealthy Sleeper 12.5000

4.4 Analysing weight condition

#Categorising Weight Status
CREATE TABLE weight_status AS(
SELECT
    Id,
    ROUND(AVG(WeightKg),2) AS AverageWeight,
    ROUND(AVG(BMI),2) AS AverageBMI,
    CASE
        WHEN ROUND(AVG(BMI),2) < 18.5
        THEN 'Underweight'
        WHEN ROUND(AVG(BMI),2) >= 18.5 AND ROUND(AVG(BMI),2) < 24.9
        THEN 'Healthy Weight'
        WHEN ROUND(AVG(BMI),2) >= 25.0 AND ROUND(AVG(BMI),2) < 29.9
        THEN 'Overweight'
    ELSE 'Obese'
    END AS Weight_Status
FROM bellabeat.weightlog
GROUP BY id);
#Determining their contribustion in our dataset
SELECT
    Weight_Status,
    COUNT(Weight_Status)/(SELECT COUNT(*) FROM weight_status)*100 AS Percent
FROM bellabeat.weight_status
GROUP BY Weight_Status
ORDER BY Weight_Status;
Weight_Status Percent
Healthy Weight 37.5000
Obese 12.5000
Overweight 50.0000

4.5 Analysing user type

#Categorising User type
CREATE TABLE ProductWorn AS (
SELECT
  Id,
  VeryActiveMinutes+FairlyActiveMinutes+LightlyActiveMinutes+SedentaryMinutes AS TotalMiniutesWorn,
  ROUND((VeryActiveMinutes+FairlyActiveMinutes+LightlyActiveMinutes+SedentaryMinutes)/1440*100,2) AS Percent,
  CASE
  WHEN ROUND((VeryActiveMinutes+FairlyActiveMinutes+LightlyActiveMinutes+SedentaryMinutes)/1440*100,2) >= 50 AND
      ROUND((VeryActiveMinutes+FairlyActiveMinutes+LightlyActiveMinutes+SedentaryMinutes)/1440*100,2)< 70
  THEN 'light User'
  WHEN ROUND((VeryActiveMinutes+FairlyActiveMinutes+LightlyActiveMinutes+SedentaryMinutes)/1440*100,2) >= 70 AND
      ROUND((VeryActiveMinutes+FairlyActiveMinutes+LightlyActiveMinutes+SedentaryMinutes)/1440*100,2)< 90
  THEN 'Moderate User'
  ELSE 'High User'
  END AS User_type
FROM bellabeat.dailyactivity
GROUP BY Id);
#Determining their contribustion in our dataset
SELECT
    User_type,
    COUNT(User_type)/(SELECT COUNT(*) FROM productworn)*100 AS Percent
FROM bellabeat.productworn
GROUP BY User_type
ORDER BY User_type;
User_type Percent
High User 60.6061
light User 27.2727
Moderate User 12.1212

4.6 Analysing time taken to Sleep

SELECT
  Id,
  ROUND(AVG(TotalTimeInBed-TotalMinutesAsleep),2) AS TimeDiffMins,
  ROUND(AVG(TotalTimeInBed-TotalMinutesAsleep)/TotalTimeInBed*100,2) AS Percent
FROM bellabeat.sleeplog2
GROUP BY Id
ORDER BY Percent DESC

Here are the Top 10 users who take maximum time to sleep

Id TimeDiffMins Percent
1644430081 52.00 40.94
3977333714 167.50 35.71
1844505072 309.00 32.15
2320127002 8.00 11.59
8378563200 40.81 11.46
5553957443 42.39 9.13
4558609924 12.40 9.05
2347167796 44.53 8.38
6117666160 31.39 7.88
6775888955 19.33 7.43

5. Conclusion

There were some limitations in the data set like there was 24 unique ID in the Sleep data and only 8 IDs in the Weight data set but I tried to derive some insights from it.

5.1 Insights

  • The highest average step count and calorie intake are on Tuesday and Saturday. They hit their lowest on Friday and Sunday.

  • The average step count is below 10000 steps which means people are not able to achieve recommended steps of 10000 each day.

  • Only 45% of the people are good sleepers who are having around 7-8 hours of sleep. 20% of the people are bad sleepers with 4-6 hours of sleep. Just like bad sleepers, 20% of the people are normal sleepers with average sleep of 6-7 hours. 12.5% of people are unhealthy sleepers with less than 4 hours of sleep daily.

  • 50% of people are overweight with a BMI between 25-29.9. 37.5% of people are in the category of healthy weight with a BMI between 18.5-24.5. 12.5% of people are suffering from obesity with BMI greater than 29.9.

  • 60% of the users wear their wearables almost all day. 27% of people are lights users who wear their wearables around 50-70% part of their day. 12% are moderate users who wear their wearables around 70-90% part of their day.

  • Sleep data shows 24 Ids out of 33 Ids which means people might don’t wear their wearables at night due to reasons like comfort, notifications etc.

  • Some users take around 30-40% time of their total time in bed to sleep which means they keep using their phone or do some kind of work after going to the bed.


5.2 Suggestions

  • We can gamify our UI such as receiving points for meeting or exceeding daily step goals, sleeping hours, and so on. Once enough points have been acquired, then the user can exchange them for a discount coupon to get some off on a future wearable purchase.

  • As we have seen that some users don’t wear their wearables at night and this issue can be solved if we market our product in such a way that "It has a comfortable strap and lightweight main unit".

  • We can collaborate with some popular apps like HealthifyMe and My Fitness Pal to track their customer data and use it for our purpose because the dataset that we have used in this study is old and not complete. We can run our marketing campaigns on their apps too.

  • We can do promotional activities in corporate offices and universities like we can create awareness programs for the staff and students that “How a healthy lifestyle can lead to a better and long life” and How Bellabeat wearables can help you achieve that lifestyle.